Improving Inference Through Conceptual Clustering
Abstract
Conceptual clustering is an important way to summarize data in an understandable manner. However, the recency of the conceptual clustering paradigm has allowed little exploration of conceptual clustering as a means of improving performance. This paper presents COBWEB, a conceptual clustering system that organizes data to maximize inference abilities. It does this by capturing attribute inter-correlations at classification tree nodes and generating inferences as a by-product of classification. Results from the domains of soybean and thyroid disease diagnosis support the success of this approach.
Machine learning is concerned with improving performance through automated knowledge acquisition and refinement [Dietterich, 1982]. Learning filters and incorporates environmental observations into a knowledge base that is used to facilitate performance at some task. Assumptions about the environment, knowledge base, and performance task all have important ramifications for the design of a learning algorithm.
This paper is concerned with conceptual clustering, a task of machine learning that has not traditionally been discussed in the larger context of intelligent processing. Conceptual clustering systems [Michalski and Stepp, 1983; Fisher, 1985; Cheng and Fu, 1985] accept a number of object descriptions (events, observations, facts) and produce a classification scheme over the observed objects. Importantly, conceptual clustering methods do not require the guidance of a teacher to direct the formation of the classification (as with learning from examples), but use an evaluation function to discover classes with good conceptual descriptions. These evaluation functions generally favor classes exhibiting many differences between objects of different classes, and few differences between objects of the same class. As with other forms of learning, the context surrounding the conceptual clustering task can have important implications for the design of these systems.
Perhaps the most important contextual factor surrounding clustering is the performance task that benefits from conceptual clustering capabilities. While most systems do not explicitly address this task, exceptions do exist. In particular, Cheng and Fu [1985] and Fu and Buchanan [1985] have used clustering techniques to organize expert system knowledge. Generalizing on their use of conceptual clustering, classifications produced by conceptual clustering systems can be a basis for effective inference of unseen object properties. The generality of classification as a means of guiding inference is manifest in recent discussions of problem-solving as classification [Clancey, 1984].
This paper describes the COBWEB system for conceptual clustering. COBWEB’s design was motivated by both environmental and performance concerns. However, this paper is primarily concerned with performance issues, in particular with the utility of COBWEB classification trees to facilitate inference during classification.¹ The following section motivates and develops an evaluation function used by COBWEB to guide class and concept formation. This measure, called category utility [Gluck and Corter, 1985], favors classes that maximize the amount of information that can be inferred from knowledge of class membership. Section 3 describes the COBWEB algorithm. The remainder of the paper focuses on the utility of COBWEB-generated classification trees for inference, concentrating particularly on soybean disease diagnosis.
COBWEB uses a measure of concept quality called category utility [Gluck and Corter, 1985] to guide formation of object classes and concepts. While our primary interest in category utility is that it favors classes that maximize inference ability, Gluck and Corter originally derived category utility as a means of predicting certain effects observed during human classification.
These effects stem from a psychological construct called the basic level, which occurs in hierarchical classification schemes and seems to be where inference abilities are maximized.
¹ COBWEB is also distinguished from other systems in that it is incremental. Issues surrounding COBWEB’s performance as an incremental system are discussed in [Fisher, 1987].
From: AAAI-87 Proceedings. Copyright ©1987, AAAI (www.aaai.org). All rights reserved.
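As an illustration of the category utility measure described above, the following is a minimal Python sketch, not COBWEB's own implementation. It assumes nominal attributes with no missing values and uses the commonly cited form of the measure: for each class, the expected number of attribute values that can be guessed correctly given class membership, minus the expected number that can be guessed without it, weighted by class probability and averaged over the number of classes. The function names and the toy objects are hypothetical and are not taken from the soybean or thyroid data.

```python
from collections import Counter


def expected_correct_guesses(objects):
    """Sum of P(attribute = value)^2 over all attribute-value pairs in `objects`."""
    n = len(objects)
    counts = Counter((attr, val) for obj in objects for attr, val in obj.items())
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(partition):
    """Category utility of a partition, given as a list of clusters (lists of dicts)."""
    all_objects = [obj for cluster in partition for obj in cluster]
    n = len(all_objects)
    # Guessing score without any knowledge of class membership.
    baseline = expected_correct_guesses(all_objects)
    # Weighted improvement in guessing score when class membership is known.
    gain = sum(
        (len(cluster) / n) * (expected_correct_guesses(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)


# Hypothetical toy objects, for illustration only.
a = {"color": "red", "shape": "round"}
b = {"color": "blue", "shape": "square"}

good = [[a, a], [b, b]]  # classes separate the two kinds of object
bad = [[a, b], [a, b]]   # classes carry no information about attributes

print(category_utility(good))  # 0.5
print(category_utility(bad))   # 0.0
```

The partition that groups identical objects scores higher because knowing the class makes every attribute value predictable, which is the sense in which the measure favors classes that maximize inference ability.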
Similar resources
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
New Approach for Customer Clustering by Integrating the LRFM Model and Fuzzy Inference System
This study aimed at providing a systematic method to analyze the characteristics of customers’ purchasing behavior in order to improve the performance of customer relationship management system. For this purpose, the improved model of LRFM (including Length, Recency, Frequency, and Monetary indices) was utilized which is now a more common model than the basic RFM model apt for analyzing the cus...
Prediction of slope stability using adaptive neuro-fuzzy inference system based on clustering methods
Slope stability analysis is an enduring research topic in the engineering and academic sectors. Accurate prediction of the factor of safety (FOS) of slopes, their stability, and their performance is not an easy task. In this work, the adaptive neuro-fuzzy inference system (ANFIS) was utilized to build an estimation model for the prediction of FOS. Three ANFIS models were implemented including g...
Multi-Output Adaptive Neuro-Fuzzy Inference System for Prediction of Dissolved Metal Levels in Acid Rock Drainage: a Case Study
Pyrite oxidation, Acid Rock Drainage (ARD) generation, and associated release and transport of toxic metals are a major environmental concern for the mining industry. Estimation of the metal loading in ARD is a major task in developing an appropriate remediation strategy. In this study, an expert system, the Multi-Output Adaptive Neuro-Fuzzy Inference System (MANFIS), was used for estimation of...
ADAPTIVE NEURO FUZZY INFERENCE SYSTEM BASED ON FUZZY C–MEANS CLUSTERING ALGORITHM, A TECHNIQUE FOR ESTIMATION OF TBM PENETRATION RATE
The tunnel boring machine (TBM) penetration rate estimation is one of the crucial and complex tasks encountered frequently to excavate the mechanical tunnels. Estimating the machine penetration rate may reduce the risks related to high capital costs typical for excavation operation. Thus establishing a relationship between rock properties and TBM pe...